Introduction (20 points)

Scientific question:

WISP-1 is a protein from the CCN protein family that is known to act as an oncogene in most types of cancer, promoting cancer cell proliferation, progression, metastasis, and invasion. What might be the structural basis of WISP-1-induced tumor growth?

Background

WNT1-inducible signaling pathway protein 1 (WISP-1), also known as CCN family member 4 (CCN4), is a secreted matricellular protein found in the extracellular matrix (ECM) that, like other ECM proteins, affects a range of cell responses. It plays a part in cellular functions such as differentiation, proliferation, migration, and survival, and its overexpression has been shown to have oncogenic properties in most cancers. An in vitro study found that recombinant WISP-1 treatment increased cell proliferation in a breast cancer context (1). Since WISP-1 interacts with other proteins in the ECM to signal through downstream pathways, inhibiting these interactions might limit the signaling behind cellular responses such as proliferation and halt its oncogenic functions, making WISP-1 a potential target for cancer therapeutics.

A study on colon cancer found that WISP-1 upregulation is associated with many oncogenic properties of cancer cells, such as proliferation and invasion, through effects on apoptosis and cell cycle checkpoints (2). β-catenin was identified as an upstream protein and a direct binding partner of WISP-1 in colorectal cancer cells. Together, they mediate the functions of WISP-1 by promoting cell proliferation and invasion, which initiates colon cancer tumorigenesis. Collectively, this evidence indicates that inhibiting overactive Wnt/β-catenin signaling might be critical in preventing or slowing the pathogenesis of colon cancers (3). Because evidence suggests that β-catenin binds WISP-1 to facilitate colon cancer cell proliferation, colony formation, and invasion, inhibiting the interaction between β-catenin and WISP-1 might serve as a therapeutic strategy. The interface between WISP-1 and β-catenin therefore warrants further investigation to identify effective inhibitors of the interaction. Since no experimental structure of WISP-1 currently exists, generating a model via homology modeling for further interaction analysis (i.e., molecular docking) between β-catenin and WISP-1 might provide insight into what constitutes a feasible inhibitor of the interaction.

Sources:

  1. https://www.nature.com/articles/srep08686
  2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5226551/
  3. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0092317

Where the data is sourced from

The protein sequence of WISP1 is sourced from the NCBI protein database.

Scientific hypothesis

If WISP-1 can bind protein partners to activate signaling pathways like the Wnt/β-catenin signaling pathway to modulate specific cell functions such as tumor cell proliferation, then there are structural domains on WISP-1 that engage in specific interactions with its binding partner β-catenin to promote tumorigenesis in colon cancer.

Description of analysis

How the data was downloaded

Loading in Packages (15 points)

Packages loaded

Biopython

BioPandas

Pandas

NumPy

MODELLER

NGLView

ProDy
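Before running the notebook, it can help to confirm that each package listed above is installed. The following is a minimal sketch using only the standard library; the import names (for example `Bio` for Biopython) are the usual ones for these packages, but they are assumptions here and should be checked against your own environment.

```python
import importlib.util

# Import names are assumed from each package's usual distribution:
# Biopython -> Bio, BioPandas -> biopandas, NGLView -> nglview, ProDy -> prody.
PACKAGES = {
    "Biopython": "Bio",
    "BioPandas": "biopandas",
    "Pandas": "pandas",
    "NumPy": "numpy",
    "MODELLER": "modeller",
    "NGLView": "nglview",
    "ProDy": "prody",
}


def check_packages(packages=PACKAGES):
    """Return {package label: True/False} depending on whether it can be located."""
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in packages.items()
    }


availability = check_packages()
for label, found in availability.items():
    print(f"{label:10s} {'found' if found else 'MISSING'}")
```

Any package reported as missing needs to be installed before the corresponding cells will run.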

Note on activating MODELLER after installation:

You will need to go to the MODELLER home page --> Registration --> sign the License Agreement to obtain a license key, then enter the key in the file named in the error message that appears on the first attempt to activate. The email containing the license key arrives within a couple of hours when University of California San Diego is used for authentication.

Performing Bioinformatics Analysis (20 points)

Description of each bioinformatic method, including the data types read in and how the method works
  1. BLAST
  1. Homology Modeling

1. BLAST

5 points for use of a built-in Bioconductor or Biopython function (or some other tool that was discussed in class like NumPy or SciPy), and description of what the function reads in and what it returns.

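The Biopython function used: Bio.SeqIO.read()

What the function reads in: a file handle (or filename) containing exactly one record, and the file format as a string (here, "fasta")

What the function returns: a SeqRecord object holding the ID, description, and sequence from the FASTA file that was read in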
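To make the "exactly one record" contract concrete, here is a dependency-free sketch that mimics it. This is not Biopython's implementation; `read_single_fasta` is a hypothetical helper, and the record below is a placeholder rather than the real WISP-1 entry. In the notebook itself the equivalent call would be `SeqIO.read(handle, "fasta")`.

```python
from io import StringIO


def read_single_fasta(handle):
    """Parse exactly one FASTA record from a handle; raise ValueError otherwise."""
    records = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            description = line[1:]
            record_id = (description.split() or [""])[0]
            records.append({"id": record_id, "description": description, "seq": ""})
        elif records:
            # Sequence lines are concatenated onto the most recent header.
            records[-1]["seq"] += line
        else:
            raise ValueError("sequence data found before a '>' header")
    if len(records) != 1:
        raise ValueError(f"expected exactly one record, found {len(records)}")
    return records[0]


# Placeholder record: the ID and residues are illustrative only.
example = StringIO(">example_protein hypothetical demo record\nMKTAYIAK\nQRQISFVK\n")
record = read_single_fasta(example)
print(record["id"], len(record["seq"]))
```

A file with zero or several records raises an error instead of silently returning the first one, which is the behavior that makes a single-record reader a safe choice for a one-sequence query file.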

Blast code source: http://prody.csb.pitt.edu/tutorials/structure_analysis/blastpdb.html

The selected template is '5nb8', with 53.52% sequence identity and an e-value of 2.45677e-19.
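The notebook does not state the rule used to pick the template. One plausible rule, assumed here purely for illustration, is to discard hits above an e-value cutoff and then take the highest sequence identity. Only the 5nb8 entry below comes from this report; the other two hits and the `1e-5` cutoff are hypothetical.

```python
def pick_template(hits, max_evalue=1e-5):
    """Return the hit with the highest identity among those passing the e-value cutoff."""
    significant = [h for h in hits if h["evalue"] <= max_evalue]
    if not significant:
        raise ValueError("no hit passes the e-value cutoff")
    return max(significant, key=lambda h: h["identity"])


hits = [
    {"pdb_id": "5nb8", "identity": 53.52, "evalue": 2.45677e-19},  # reported above
    {"pdb_id": "hypA", "identity": 41.00, "evalue": 3.0e-12},      # hypothetical hit
    {"pdb_id": "hypB", "identity": 60.00, "evalue": 0.8},          # hypothetical, not significant
]

best = pick_template(hits)
print(best["pdb_id"], best["identity"], best["evalue"])
```

Filtering on e-value first keeps a short, high-identity but statistically weak match (like the hypothetical `hypB`) from being chosen over a significant one.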

2. Homology Modeling

Code Source: https://salilab.org/modeller/tutorial/basic.html

Note

For the alignment step performed below, you need to manually create the file 'WISP1.ali' as a plain-text file in the same directory as the notebook (the professor approved this approach because the MODELLER documentation did not explain how to generate the file).

What should go in the file:
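The MODELLER basic tutorial cited above uses the PIR format for the target sequence file: a `>P1;code` header, a colon-separated description line beginning with `sequence`, and the residues terminated by `*`. A minimal sketch of generating such an entry follows; `make_pir_entry` is a hypothetical helper and the sequence is a placeholder, so the real WISP-1 sequence from NCBI must be substituted.

```python
def make_pir_entry(code, sequence, width=75):
    """Build a PIR-format entry for a target sequence that has no known structure."""
    header = f">P1;{code}"
    # Ten colon-separated fields; only the type and code are filled for a target.
    description = f"sequence:{code}:::::::0.00: 0.00"
    terminated = sequence + "*"  # PIR sequences end with an asterisk
    chunks = [terminated[i:i + width] for i in range(0, len(terminated), width)]
    return "\n".join([header, description, *chunks]) + "\n"


# Placeholder residues, NOT the real WISP-1 sequence.
placeholder_sequence = "MKTAYIAKQRQISFVKSHFSRQ"
entry = make_pir_entry("WISP1", placeholder_sequence)
print(entry)
```

The printed text can be pasted into 'WISP1.ali'. The code after `>P1;` should match the sequence name used in the modeling script.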

Scroll to the last few lines of the log file generated above, which give a summary of all the models built.

[Screenshot: summary of the built models at the end of the log file]

The model selected for the rest of the analysis is WISP1.B99990001.pdb.

Visualizing the structure

The model file WISP1.B99990001.pdb should already be present in the same directory as your notebook.

Plotting The Results (15 points)

Description of data analysis methods:

  1. Model Evaluation
  1. Protein interaction site prediction
  1. Interactions of WISP-1 with a Specific Protein

Note on Data analysis method

The alternative to using online web servers for molecular docking is to build our own machine learning models for prediction, which is out of scope for this class. The professor approved the use of online web servers for the docking prediction analysis because it is better suited to my hypothesis.

1. Model Evaluation

Method: After constructing the model by homology modeling, submit the model PDB file to PDBsum to generate a Ramachandran plot analysis, which is used to evaluate the validity of the protein structure. The result (as a URL) should arrive within 48 hours of submitting the job to the server. To access the PDBsum data for your structure, you need the password sent to the email address you left on the server.

The online webserver PDBsum: http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html

Here is the result page that the URL should direct you to:

[Screenshot: PDBsum result page]

Click on PROCHECK to obtain the Ramachandran plot analysis. Here is what you should see as the Main Ramachandran plot.

[Screenshot: main Ramachandran plot from PROCHECK]

For our homology model, 81.9% of residues are in the most favoured regions, 13.2% are in the additional allowed regions, 3.5% are in the generously allowed regions, and only 1.3% are in the disallowed regions. We now proceed to the interaction analysis using this model.
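The percentages above can be tallied in a few lines. This is only an arithmetic sketch of the reported PROCHECK figures; the 90% benchmark is the rule of thumb referred to in the results discussion of this report, and the variable names are illustrative.

```python
# Ramachandran region percentages reported by PROCHECK for the model.
rama = {
    "most favoured": 81.9,
    "additional allowed": 13.2,
    "generously allowed": 3.5,
    "disallowed": 1.3,
}
GOOD_MODEL_BENCHMARK = 90.0  # rule-of-thumb share expected in the most favoured regions

total = round(sum(rama.values()), 1)
favoured_or_allowed = round(rama["most favoured"] + rama["additional allowed"], 1)
shortfall = round(GOOD_MODEL_BENCHMARK - rama["most favoured"], 1)

print(f"total reported: {total}%")
print(f"most favoured + additional allowed: {favoured_or_allowed}%")
print(f"shortfall versus benchmark: {shortfall} percentage points")
```

The categories sum to slightly under 100% because each value is rounded to one decimal place.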

2. General protein interaction site prediction

Method:

After evaluating the model, submit the model PDB file to a protein-protein interaction web server for analysis of general binding sites.

The online web server meta-PPISP: https://pipe.rcc.fsu.edu/meta-ppisp/

After you submit the PDB file to the web server, the result should arrive within a couple of hours. The result is sent to the email address you left on the web server and contains the interface prediction data in a PDB file. Upload that PDB file to the Jupyter notebook and analyze the results.

Code source: http://rasbt.github.io/biopandas/tutorials/Working_with_PDB_Structures_in_DataFrames/
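The notebook uses BioPandas for this step. As a dependency-free illustration of the same idea, the sketch below reads fixed-width ATOM lines and flags residues whose score exceeds a cutoff. It assumes the server writes its per-residue score into the B-factor column and uses a cutoff of 0.34; both are assumptions to verify against the server's output description. The demo lines and their scores are made up solely to exercise the parser.

```python
def atom_line(serial, name, res_name, chain, res_seq, b_factor):
    """Format a minimal fixed-width PDB ATOM line (coordinates set to zero)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{b_factor:>6.2f}"
    )


def residue_scores(pdb_lines):
    """Map (chain, residue number, residue name) to the B-factor column value."""
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]), line[17:20].strip())
            scores[key] = float(line[60:66])
    return scores


def predicted_interface(scores, cutoff):
    """Return residues whose score exceeds the cutoff, sorted by chain and number."""
    return sorted(key for key, value in scores.items() if value > cutoff)


# Made-up demo lines; residue numbers and scores are NOT from the real output.
demo = [
    atom_line(1, "CA", "GLY", "A", 10, 0.62),
    atom_line(2, "CA", "ALA", "A", 11, 0.10),
    atom_line(3, "CA", "SER", "A", 12, 0.48),
]
interface = predicted_interface(residue_scores(demo), cutoff=0.34)
print(interface)
```

With the real file, `pdb_lines` would be the lines of the PDB file returned by the server.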

After obtaining the residues that are likely to be at the interaction interface, visualize them in PyMOL and place the image in the notebook.

The residues of interest identified in the data analysis above are colored violet.

[Figure: WISP-1 model with the predicted interface residues colored violet]

3. Interactions of WISP-1 with a Specific Protein

Programs involved:

HawkDock Server and PyMOL

Description:

For protein-protein interaction analysis between the constructed WISP-1 model and a specific protein, the web server HawkDock (http://cadd.zju.edu.cn/hawkdock/) might give more insight. You can submit the WISP-1 model file and the PDB file of an interacting protein of interest to the server. The protein of interest investigated in this project is β-catenin, which was identified as a binding partner of WISP-1.

Method:

Submit the constructed WISP-1 model and the PDB structure file of β-catenin (PDB ID: 3SL9) to the server. Either one can be set as the ligand or the receptor. The result should arrive within 72 hours and can be retrieved from a URL sent to you if you leave an email address when submitting the job.

Here is the result page that the URL should direct you to:

[Screenshot: HawkDock result page]

Click "download top 10 predictions". We will analyze model 1 from the top 10 predictions.

In PyMOL, open the model.1.pdb file you just downloaded from the HawkDock server. Click S in the bottom right bar to display the sequences of the polypeptide chains of both WISP-1 and β-catenin. Select each structure and give each a different color by clicking C on the right. Here WISP-1 is in orange and β-catenin is in cyan.

[Figure: docked complex with WISP-1 in orange and β-catenin in cyan]

Since we want to find out which residues on WISP-1 interact with the other protein, click H on the WISP-1 selection to hide all of the WISP-1 structure. Then type the following command into the top bar: show sticks, byres all within 5 of ___ (fill in the blank with the name you gave your β-catenin selection).

For example, my command reads: show sticks, byres all within 5 of beta-catenin

This command shows, as sticks, every residue that has at least one atom within 5 Å of β-catenin, which reveals the WISP-1 residues near the interface.

code source: https://pymolwiki.org/index.php/Selection_Algebra
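To make clear what the proximity selection computes, here is a plain-Python sketch of the same idea: a residue is kept if any of its atoms lies within the cutoff distance of any partner atom. This is not PyMOL's implementation; `residues_within` is a hypothetical helper, the coordinates are toy values, and whether the boundary is inclusive in PyMOL is not confirmed here.

```python
from math import dist


def residues_within(query_atoms, partner_atoms, cutoff=5.0):
    """Return residues in query_atoms that have any atom within cutoff of partner_atoms.

    Each atom is a tuple: (residue_label, (x, y, z)), with coordinates in angstroms.
    """
    close = set()
    for residue, xyz in query_atoms:
        if residue in close:
            continue  # one close atom is enough to keep the whole residue
        if any(dist(xyz, other) <= cutoff for _, other in partner_atoms):
            close.add(residue)
    return sorted(close)


# Toy coordinates (hypothetical), not taken from the docking model.
wisp1_atoms = [
    ("res_A", (0.0, 0.0, 0.0)),
    ("res_A", (1.5, 0.0, 0.0)),
    ("res_B", (20.0, 0.0, 0.0)),
]
partner_atoms = [("res_X", (4.0, 0.0, 0.0))]

nearby = residues_within(wisp1_atoms, partner_atoms)
print(nearby)
```

Expanding the match from atoms to whole residues is what the `byres` keyword does in the PyMOL command, so a residue is drawn completely even if only one of its atoms is close to the partner.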

[Figure: WISP-1 residues within 5 Å of β-catenin shown as sticks]

Zooming in, it can be observed that 359Y, 360P, 361D, 362F, 363S, 364E, 366A, and 367N are the WISP-1 residues predicted to be in proximity to β-catenin and might be engaged in the interaction. In this model, 367N specifically is likely to form hydrogen bonds with β-catenin residues 40K and 42E (indicated by the yellow dashed lines).

To find the hydrogen bonding, select WISP-1 selection —> A —> Find —> polar contact —> any other atom. Yellow dashed lines will be used to display hydrogen bonds.

[Figure: predicted hydrogen bonds between WISP-1 and β-catenin in model 1]

The same analysis was performed for model 2 from the top 10 predictions.

357E, 358S, 359Y, 360P, 361D, 362F, 363S, 364E, 366A, and 367N are the WISP-1 residues predicted to be in proximity to β-catenin and might be engaged in the interaction.

In this model, 364E and 367N specifically are likely to form hydrogen bonds with β-catenin residues 603K, 605E, and 607Y.

[Figure: WISP-1 and β-catenin interaction in model 2]

Analyzing the Results (15 points)

The structure of the WISP-1 (CCN4) protein, which currently has no experimental structure in the PDB, was generated by homology modeling with the template 5nb8, which has 53.52% sequence identity and an e-value of 2.45677e-19. 5nb8 is a structure of the vWC domain from CCN3, another member of the CCN protein family. Evaluated with a Ramachandran plot, the model has 81.9% of residues in the most favoured regions, 13.2% in the additional allowed regions, 3.5% in the generously allowed regions, and only 1.3% in the disallowed regions. Although 81.9% falls short of the roughly 90% of residues usually expected in the most favoured regions for a good protein structure (1), the model was built from the most closely related template available in the PDB (53.52% sequence identity), so it could serve as a reasonable starting point for further prediction and evaluation.

The WISP-1 structure was then analyzed with different structure-based online servers for protein-protein interaction. In the general analysis, residues 336W through 366A, with 365I and 367N excluded, were predicted to be likely residues at a protein interaction interface. For the more specific analysis with the β-catenin structure (PDB ID: 3SL9), the docking server generated a series of predicted complexes, and model 1 and model 2 from the top 10 predictions were analyzed. In model 1, WISP-1 residues 359Y, 360P, 361D, 362F, 363S, 364E, 366A, and 367N are predicted to be in proximity to β-catenin and might be engaged in the interaction. Model 2 predicts that 357E, 358S, 359Y, 360P, 361D, 362F, 363S, 364E, 366A, and 367N are the WISP-1 residues interacting with β-catenin. This is largely consistent with the general prediction that residues from 336W to 366A might be involved at an interaction interface with another protein. The models also predict specific hydrogen bonds between WISP-1 and β-catenin. The presence of intermolecular hydrogen bonds supports the plausibility of the predicted complexes, given that hydrogen bonds are known to stabilize protein-ligand complexes (2) and that the docking results are significant (3). Because the homology model helps identify residues potentially involved in docking between WISP-1 and its binding partner β-catenin, which together mediate tumor formation and progression, it might provide insight into what constitutes a suitable inhibitor for blocking their specific interaction to slow or prevent tumorigenesis.

Sources cited:

  1. https://www.nature.com/articles/srep43830
  2. https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-1-13
  3. https://onlinelibrary.wiley.com/doi/10.1002/prot.24104
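The overlap between the residue lists reported in the results discussion can be checked with simple set operations. The sketch below uses only the residue numbers stated in this report; the general prediction is encoded as 336 through 366 without 365, which is one reading of the description.

```python
# WISP-1 residue numbers within 5 angstroms of β-catenin, as reported for each docking model.
model_1 = {359, 360, 361, 362, 363, 364, 366, 367}
model_2 = {357, 358, 359, 360, 361, 362, 363, 364, 366, 367}

# General interface prediction as described: 336 through 366, excluding 365
# (367 is also reported as not predicted).
general = set(range(336, 367)) - {365}

shared = model_1 & model_2
only_model_2 = model_2 - model_1
outside_general = (model_1 | model_2) - general

print("shared by both models:", sorted(shared))
print("only in model 2:", sorted(only_model_2))
print("docking residues outside the general prediction:", sorted(outside_general))
```

Under this reading, every docking residue except 367N also appears in the general prediction.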

Code Formatting Requirements (15 points)

Explanation of what global and local variables are:
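A global variable is defined at the top level of a notebook or module and can be read from any code that runs afterwards, including inside functions. A local variable is defined inside a function and exists only while that function runs. The short sketch below illustrates the difference; the names are illustrative and are not taken from the notebook's code cells.

```python
TEMPLATE_ID = "5nb8"  # global variable: defined at module level, readable everywhere below


def describe_template():
    label = "template"  # local variable: exists only while this function runs
    return f"{label}: {TEMPLATE_ID}"  # the function can read the global


message = describe_template()
print(message)

# The local name is not visible outside the function.
try:
    label
except NameError:
    local_is_hidden = True
else:
    local_is_hidden = False
print("local variable hidden outside function:", local_is_hidden)
```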